# load the package
library("terra")
In the first step, we need to create a list of files (rasters) that
we are going to load. To do this, we can use the
list.files() function, which takes a path to a folder with
files as an argument. In addition, we must indicate what kind of files
we want to load (pattern = "\\.TIF$") and return full paths
to the files (full.names = TRUE).
# listing files from a directory
files = list.files("dane_cut/", pattern = "\\.TIF$", full.names = TRUE)
files
## [1] "dane_cut/LC08_L2SP_191023_20230605_20230613_02_T1_SR_B1.TIF"
## [2] "dane_cut/LC08_L2SP_191023_20230605_20230613_02_T1_SR_B2.TIF"
## [3] "dane_cut/LC08_L2SP_191023_20230605_20230613_02_T1_SR_B3.TIF"
## [4] "dane_cut/LC08_L2SP_191023_20230605_20230613_02_T1_SR_B4.TIF"
## [5] "dane_cut/LC08_L2SP_191023_20230605_20230613_02_T1_SR_B5.TIF"
## [6] "dane_cut/LC08_L2SP_191023_20230605_20230613_02_T1_SR_B6.TIF"
## [7] "dane_cut/LC08_L2SP_191023_20230605_20230613_02_T1_SR_B7.TIF"
Once we have created a list of files, we can load them using the
rast() function from the terra package and
then display the metadata.
# load raster data
landsat = rast(files)
landsat # calling the object displays the metadata
## class : SpatRaster
## dimensions : 2360, 2469, 7 (nrow, ncol, nlyr)
## resolution : 30, 30 (x, y)
## extent : 594915, 668985, 5777865, 5848665 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 33N (EPSG:32633)
## sources : LC08_L2SP_191023_20230605_20230613_02_T1_SR_B1.TIF
## LC08_L2SP_191023_20230605_20230613_02_T1_SR_B2.TIF
## LC08_L2SP_191023_20230605_20230613_02_T1_SR_B3.TIF
## ... and 4 more source(s)
## names : LC08_~SR_B1, LC08_~SR_B2, LC08_~SR_B3, LC08_~SR_B4, LC08_~SR_B5, LC08_~SR_B6, ...
## min values : 31, 28, 2739, 2745, 7413, 6857, ...
## max values : 36332, 34628, 36618, 32237, 50996, 61014, ...
We can also shorten or rename the spectral bands. Before this, make sure that the bands are loaded in the correct order.
names(landsat) # original names
## [1] "LC08_L2SP_191023_20230605_20230613_02_T1_SR_B1"
## [2] "LC08_L2SP_191023_20230605_20230613_02_T1_SR_B2"
## [3] "LC08_L2SP_191023_20230605_20230613_02_T1_SR_B3"
## [4] "LC08_L2SP_191023_20230605_20230613_02_T1_SR_B4"
## [5] "LC08_L2SP_191023_20230605_20230613_02_T1_SR_B5"
## [6] "LC08_L2SP_191023_20230605_20230613_02_T1_SR_B6"
## [7] "LC08_L2SP_191023_20230605_20230613_02_T1_SR_B7"
names(landsat) = paste0("B", 1:7) # shorten the names
names(landsat) # new names
## [1] "B1" "B2" "B3" "B4" "B5" "B6" "B7"
# or alternatively rename
# names(landsat) = c("Ultra Blue", "Blue", "Green", "Red", "NIR", "SWIR1", "SWIR2")
Loading vector data is done in an analogous way using the
vect() function.
# load vector data
poly = vect("Poznan.gpkg")
poly
## class : SpatVector
## geometry : polygons
## dimensions : 1, 0 (geometries, attributes)
## extent : 326611.5, 391121.8, 477976.1, 536838.4 (xmin, xmax, ymin, ymax)
## source : Poznan.gpkg
## coord. ref. : ETRF2000-PL / CS92 (EPSG:2180)
As we can see from the metadata, raster and vector data have
different coordinate reference systems (CRS), which is troublesome. The
easiest way is to transform the vector data into a raster’s CRS and we
can do it with the project() function and specifying the
EPSG code.
poly = project(poly, "EPSG:32633")
crs(poly, describe = TRUE)$code # check EPSG after transformation
## [1] "32633"
Now we can prepare a simple visualization using the near infrared band (NIR; B5) and polygon as an example.
# visualization
plot(landsat[[5]], main = "Poznan commune") # alternatively: plot(landsat[["B5"]])
plot(poly, add = TRUE)
The extent of our analysis area is limited to the Poznan commune,
while the satellite scene has a much larger extent. In such a situation,
we can crop the rasters, so that their further processing will be faster
and the results will take up less space on disk. The crop()
function is used to crop the rasters, and we need to specify the raster
and vector as arguments.
Note that the rasters are represented as matrices, so the cropping is
done to the bounding box. To include only the area of our polygon in the
analysis, we need to mask the pixels outside the boundary. This can be
done by setting the mask = TRUE argument in the
aforementioned function.
landsat = crop(landsat, poly, mask = TRUE)
plot(landsat[[5]], main = "Poznan commune")
plot(poly, add = TRUE)
In the next step, we can easily check the descriptive statistics of our dataset.
summary(landsat)
## B1 B2 B3 B4
## Min. : 4095 Min. : 1959 Min. : 8075 Min. : 7655
## 1st Qu.: 7813 1st Qu.: 8037 1st Qu.: 8987 1st Qu.: 8408
## Median : 8176 Median : 8429 Median : 9659 Median : 9170
## Mean : 8464 Mean : 8771 Mean : 9901 Mean : 9706
## 3rd Qu.: 8879 3rd Qu.: 9229 3rd Qu.:10408 3rd Qu.:10499
## Max. :23937 Max. :25640 Max. :28745 Max. :30744
## NA's :42589 NA's :42588 NA's :42588 NA's :42588
## B5 B6 B7
## Min. : 7413 Min. : 7447 Min. : 7406
## 1st Qu.:16791 1st Qu.:11743 1st Qu.: 9318
## Median :18907 Median :13078 Median :10541
## Mean :18980 Mean :13783 Mean :11553
## 3rd Qu.:21018 3rd Qu.:15265 3rd Qu.:12847
## Max. :33376 Max. :34485 Max. :37960
## NA's :42588 NA's :42588 NA's :42588
As we can see, the spectral reflectance values are several thousand for each band and are encoded as integers. The spectral reflectance should be in the range from 0 to 1. Therefore, we need to scale our data using the following equation (this only applies to reflectance, not temperature!):
\[x = x \cdot 0.0000275 - 0.2\]
For example, the pixel value in the near infrared (NIR) band is 15000. Using the above formula, we need to multiply this value by 0.0000275 (scale factor) and then subtract 0.2 (offset). As a result, we will get a reflection equal to 0.2125. Note that each product/collection has a different formula and it is necessary to consult the documentation.
We don’t need to apply this formula separately for each band in the loop because the arithmetic operations in the terra package is applied to all bands by default.
landsat = landsat * 2.75e-05 - 0.2
summary(landsat)
## B1 B2 B3 B4
## Min. :-0.09 Min. :-0.15 Min. :0.02 Min. :0.01
## 1st Qu.: 0.01 1st Qu.: 0.02 1st Qu.:0.05 1st Qu.:0.03
## Median : 0.02 Median : 0.03 Median :0.07 Median :0.05
## Mean : 0.03 Mean : 0.04 Mean :0.07 Mean :0.07
## 3rd Qu.: 0.04 3rd Qu.: 0.05 3rd Qu.:0.09 3rd Qu.:0.09
## Max. : 0.46 Max. : 0.51 Max. :0.59 Max. :0.65
## NA's :42589 NA's :42588 NA's :42588 NA's :42588
## B5 B6 B7
## Min. :0.00 Min. :0.00 Min. :0.00
## 1st Qu.:0.26 1st Qu.:0.12 1st Qu.:0.06
## Median :0.32 Median :0.16 Median :0.09
## Mean :0.32 Mean :0.18 Mean :0.12
## 3rd Qu.:0.38 3rd Qu.:0.22 3rd Qu.:0.15
## Max. :0.72 Max. :0.75 Max. :0.84
## NA's :42588 NA's :42588 NA's :42588
We can still see that some values exceed the range of 0 to 1. These are outliers that are usually associated with incorrect measurement or oversaturation. We can solve this problem in two ways:
The first way can cause us to lose a large part of the dataset. The second way, on the other hand, can cause skewed results.
# method 1
landsat[landsat < 0] = NA
landsat[landsat > 1] = NA
# method 2
landsat[landsat < 0] = 0
landsat[landsat > 1] = 1
After scaling the values, we can display the RGB composition. In this
case, instead of plot() function, use
plotRGB() function and define the order of red, green and
blue bands. In addition, we need to specify the maximum reflection value
for the bands (in our case scale = 1). It often happens
that compositions are too dark/bright, then is helpful to apply color
stretching using the stretch = "lin" or
stretch = "hist" argument.
# plotRGB(landsat, r = 4, g = 3, b = 2, scale = 1)
plotRGB(landsat, r = 4, g = 3, b = 2, scale = 1, stretch = "lin")
# load clustering package
library("cluster")
Data for modeling must be prepared in an appropriate way.
Classification models most often require a matrix or data frame at the
training stage. Raster data can be transformed into a matrix using the
values() function (or alternatively coercing by
as.matrix()).
mat = values(landsat)
nrow(mat) # print number of rows/pixels
## [1] 4182465